Apache Impala
   HOME

TheInfoList



OR:

Apache Impala is an
open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized so ...
massively parallel processing (MPP) SQL query engine for data stored in a
computer cluster A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software. The comp ...
running
Apache Hadoop Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage a ...
. Impala has been described as the open-source equivalent of Google F1, which inspired its development in 2012.


Description

Apache Impala is a query engine that runs on Apache Hadoop. The project was announced in October 2012 with a public
beta test A software release life cycle is the sum of the stages of development and maturity for a piece of computer software ranging from its initial development to its eventual release, and including updated versions of the released version to help impro ...
distribution and became generally available in May 2013. Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in
HDFS Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage an ...
and
Apache HBase HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File Sys ...
without requiring data movement or transformation. Impala is integrated with Hadoop to use the same file and data formats, metadata, security and resource management frameworks used by
MapReduce MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. A MapReduce program is composed of a ''map'' procedure, which performs filtering ...
, Apache Hive,
Apache Pig Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programm ...
and other Hadoop software. Impala is promoted for analysts and data scientists to perform analytics on data stored in Hadoop via SQL or
business intelligence Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of business intelligence technologies include reporting, online analytical ...
tools. The result is that large-scale data processing (via MapReduce) and interactive queries can be done on the same system using the same data and metadata – removing the need to migrate data sets into specialized systems and/or proprietary formats simply to perform analysis. Features include: *Supports
HDFS Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage an ...
, S3, ABFS,
Apache HBase HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File Sys ...
and
Apache Kudu Apache Kudu is a free and open source column-oriented data store of the Apache Hadoop ecosystem. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides completeness to Hadoop's storage layer to enable ...
storage, *Reads Hadoop file formats, including text, LZO, SequenceFile,
Avro AVRO, short for Algemene Vereniging Radio Omroep ("General Association of Radio Broadcasting"), was a Dutch public broadcasting association operating within the framework of the Nederlandse Publieke Omroep system. It was the first public broa ...
, RCFile, Parquet and ORC *Supports Hadoop security ( Kerberos authentication,
Ldap The Lightweight Directory Access Protocol (LDAP ) is an open, vendor-neutral, industry standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. Directory servi ...
), *Fine-grained, role-based authorization with Apache Sentry and Apache ranger *Uses metadata,
ODBC In computing, Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. An ...
driver, and SQL syntax from Apache Hive. In early 2013, a column-oriented file format called Parquet was announced for architectures including Impala. In December 2013,
Amazon Web Services Amazon Web Services, Inc. (AWS) is a subsidiary of Amazon that provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis. These cloud computing web services provide d ...
announced support for Impala. In early 2014, MapR added support for Impala. In 2015, another format called Kudu was announced, which
Cloudera Cloudera, Inc. is an American software company providing enterprise data management systems that make significant use of Apache Hadoop. As of January 31, 2021, the company had approximately 1,800 customers. History Cloudera, Inc. was formed on J ...
proposed to donate to the
Apache Software Foundation The Apache Software Foundation (ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open source software projects. The ASF was formed from a group of developers of the ...
along with Impala. Impala graduated to an Apache Top-Level Project (TLP) on 28 November 2017.


See also

*
Apache Drill Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dre ...
— similar open source project inspired by Dremel * Dremel — similar tool from Google *
Trino Trino ( pms, Trin) is a '' comune'' (municipality) in the Province of Vercelli in the Italian region Piedmont, located about northeast of Turin and about southwest of Vercelli, at the foot of the Montferrat hills. Trino borders the following mu ...
— open source SQL query engine created by the creators of Presto * Presto — open source SQL query engine created by Facebook and supported by
Teradata Teradata Corporation is an American software company that provides cloud database and analytics-related software, products, and services. The company was formed in 1979 in Brentwood, California, as a collaboration between researchers at Caltech ...


References


External links


Apache Impala
project website
Impala GitHub
project source code {{DEFAULTSORT:Impala
Impala The impala or rooibok (''Aepyceros melampus'') is a medium-sized antelope found in eastern and southern Africa. The only extant member of the genus ''Aepyceros'' and tribe Aepycerotini, it was first described to European audiences by Ger ...
Cloud platforms Free system software Hadoop 2013 software